Abstract. We show a new lower bound for the maximum number of
runs in a string. We prove that for any $\varepsilon > 0$, $(\alpha - \varepsilon)n$ is an asymptotic
lower bound, where $\alpha = 56733/60064 \approx 0.944542$. It is superior to the
previous bound $3/(1 + \sqrt{5}) \approx 0.927$ given by Franek et al. [1, 2]. Moreover,
our construction of the strings and our proof are much simpler than theirs.
1 Introduction



Repetitions in strings are an important element in the analysis and processing of
strings. It was shown in [3] that when considering maximal repetitions, or runs,
the maximum number of runs $\rho(n)$ in any string of length $n$ is $O(n)$, leading to a
linear time algorithm for computing all the runs in a string. Although the authors
of [3] were not able to give bounds for the constant factor, there have been several
works to this end [4-6]. The best currently known upper bound (see footnote 3) is
$\rho(n) < 1.048n$, obtained by calculations based on the proof technique of [6]. The
technique bounds the number of runs for each string by considering runs in two parts:
runs with long periods, and runs with short periods. The former are sparser
and easier to bound, while the latter are bounded by an exhaustive calculation
concerning how runs of different periods can overlap in an interval of some length.
On the other hand, an asymptotic lower bound on $\rho(n)$ is presented in [2], where
it is shown that for any $\varepsilon > 0$, there exists an integer $N > 0$ such that for any
$n > N$, $\rho(n) > (\alpha - \varepsilon)n$, where $\alpha = \frac{3}{1+\sqrt{5}} \approx 0.927$. It was conjectured in [1] that
this bound is optimal.

In this paper, we prove that the conjecture is false by showing a new lower
bound $\alpha = 56733/60064 \approx 0.944542$. First we show a concrete string $\tau$ of length
60064, which contains 56714 runs. It immediately disproves the conjecture,
since $56714/60064 \approx 0.944226$ is already higher than the previous bound 0.927.
Then we prove that the string $\tau^k$, which is the string obtained by concatenating
$k$ copies of $\tau$, contains $56733k - 18$ runs for any $k \ge 2$. Since $|\tau^k| = 60064k$, it
yields the new lower bound $56733/60064$ as $k \to \infty$.

3 Presented on the website http://www.csd.uwo.ca/faculty/ilie/runs.html 



2 Preliminaries 

Let $\Sigma$ be a finite set of symbols, called an alphabet. Strings $x$, $y$ and $z$ are said to
be a prefix, substring, and suffix of the string $w = xyz$, respectively. The length
of a string $w$ is denoted by $|w|$. The $i$-th symbol of a string $w$ is denoted by $w[i]$
for $1 \le i \le |w|$, and the substring of $w$ that begins at position $i$ and ends at
position $j$ is denoted by $w[i : j]$ for $1 \le i \le j \le |w|$. A string $w$ has period $p$ if
$w[i] = w[i + p]$ for $1 \le i \le |w| - p$. A string $w$ is called primitive if $w$ cannot be
written as $u^k$, where $k$ is a positive integer, $k \ge 2$.

A string $u$ is a run if it is periodic with (minimum) period $p \le |u|/2$. A
substring $u = w[i : j]$ of $w$ is a run in $w$ if it is a run of period $p$ and neither
$w[i - 1 : j]$ nor $w[i : j + 1]$ is a run of period $p$, which means that the run is maximal.
We denote the run $u = w[i : j]$ in $w$ by the triple $(i, j - i + 1, p)$ consisting of
the begin position $i$, the length $|u|$, and the minimum period $p$ of $u$. A run of $w$
which is a prefix (resp. suffix) of $w$ is called a prefix (resp. suffix) run of $w$. For
a string $w$, we denote by $\mathit{run}(w)$ the number of runs in $w$.

For example, the string aabaabaaaacaacac contains the following 7 runs:
$(1,2,1) = a^2$, $(4,2,1) = a^2$, $(7,4,1) = a^4$, $(12,2,1) = a^2$, $(13,4,2) = (ac)^2$,
$(1,8,3) = (aab)^{8/3}$, and $(9,7,3) = (aac)^{7/3}$. Thus $\mathit{run}(\text{aabaabaaaacaacac}) = 7$.
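The definition of a run can be checked mechanically on this example. The following Python sketch enumerates runs by brute force. The function names `smallest_period` and `runs` are illustrative choices that do not come from the paper, and the approach is only meant for short strings.

```python
def smallest_period(s):
    """Return the minimum period of s (len(s) if no shorter period exists)."""
    n = len(s)
    for p in range(1, n + 1):
        if all(s[k] == s[k + p] for k in range(n - p)):
            return p
    return n


def runs(w):
    """Return all runs of w as sorted 1-based triples (begin, length, period)."""
    n = len(w)
    found = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            # Extend to the right while the period-p condition keeps holding.
            j = i
            while j + p < n and w[j] == w[j + p]:
                j += 1
            # w[i:j+p] has period p and cannot be extended on either side.
            length = j + p - i
            if length >= 2 * p and smallest_period(w[i:j + p]) == p:
                found.append((i + 1, length, p))
            i = j + 1
    return sorted(found)


example_runs = runs("aabaabaaaacaacac")
print(len(example_runs), example_runs)
```

For the example string this yields the seven triples listed above, including $(1,8,3)$ and $(9,7,3)$.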

We are interested in the behavior of the maxrun function defined by 

$\rho(n) = \max\{\mathit{run}(w) \mid w \text{ is a string of length } n\}$.

Franek, Simpson and Smyth [1] showed a beautiful construction of a series of
strings which contain many runs, and later Franek and Qian Yang [2] formally
proved a family of true asymptotic lower bounds arbitrarily close to $\frac{3}{1+\sqrt{5}} n$ as
follows.

Theorem 1 ([2]). For any $\varepsilon > 0$ there exists a positive integer $N$ so that
$\rho(n) > \left(\frac{3}{1+\sqrt{5}} - \varepsilon\right) n$ for any $n > N$.

3 Basic Properties 

In this section, we summarize some basic properties concerning periods and 
repetitions in strings, which will be utilized in the sequel. 

The next lemma, given by Fine and Wilf [7], provides an important property
of the periods of a string.

Lemma 1 (Periodicity Lemma (see [8,9])). Let $p$ and $q$ be two periods of
a string $w$. If $p + q - \gcd(p, q) \le |w|$, then $\gcd(p, q)$ is also a period of $w$.
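As a sanity check of the statement, the lemma can be tested exhaustively on short strings. The sketch below is only an illustration: the helper names `periods` and `lemma_holds` are made up here, and the code treats $|w|$ itself as a (trivial) period.

```python
from itertools import product
from math import gcd


def periods(w):
    """All p in 1..|w| with w[i] = w[i + p] wherever both positions exist."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[k] == w[k + p] for k in range(n - p))]


def lemma_holds(w):
    """Check the Periodicity Lemma for every pair of periods of w."""
    ps = set(periods(w))
    return all(gcd(p, q) in ps
               for p in ps for q in ps
               if p + q - gcd(p, q) <= len(w))


# Exhaustive check over all binary strings of length 1..8.
ok = all(lemma_holds("".join(t))
         for n in range(1, 9)
         for t in product("ab", repeat=n))
print(ok)

# The length condition matters: abaaba has periods 3 and 5 but not gcd(3, 5) = 1,
# which is consistent with the lemma because 3 + 5 - 1 = 7 > 6.
print(periods("abaaba"))
```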

For a string $w$, let us consider the series of strings $w, w^2, w^3, w^4, \ldots$, and observe
all runs contained in these strings. Several cases complicate the task
of counting the number of runs in these strings.

1. A run in $w^k$ which is neither a suffix nor a prefix run of $w^k$ is also a run in
$w^{k+1}$.

2. A suffix run in $w^k$ and a prefix run in $w$ may be merged into one run in
$w^{k+1}$.

3. A suffix run in $w^k$ may be extended to a run in $w^{k+1}$.

4. A new run may be created at the border between $w^k$ and $w$ in $w^{k+1}$.

Concerning case 4, note that a new run that did not appear in $w$ or $w^2$ may be
created in $w^3$. For example, consider the strings $w$ = abcacabc and $r = (\text{cabca})^2$.
We can verify that $r$ is a run $(8, 10, 5)$ of $w^3$ = abcacab cabcacabca bcacabc
(the middle block is $r$), while $r$ does not appear in $w^2$ = abcacabcabcacabc. Moreover,
the same argument also holds for the binary alphabet $\{0, 1\}$: replace a, b, c by 01, 10, 00,
respectively, in the above example.
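The claim about this example can be checked directly from the definition of a run. The helper names `has_period` and `is_run` in the following sketch are hypothetical and chosen only for this illustration; positions are 1-based as in the text.

```python
def has_period(s, p):
    """True if s[k] == s[k + p] for every k where both positions exist."""
    return all(s[k] == s[k + p] for k in range(len(s) - p))


def is_run(w, i, l, p):
    """True if the 1-based triple (i, l, p) denotes a run of w."""
    s = w[i - 1:i - 1 + l]
    if len(s) != l or l < 2 * p or not has_period(s, p):
        return False
    if any(has_period(s, q) for q in range(1, p)):
        return False  # p is not the minimum period
    # Maximality: the period-p condition must fail one step to the left and right.
    left_ext = i >= 2 and w[i - 2] == w[i - 2 + p]
    right_ext = i - 1 + l < len(w) and w[i - 1 + l] == w[i - 1 + l - p]
    return not (left_ext or right_ext)


w = "abcacabc"
r = "cabca" * 2
print(is_run(w * 3, 8, 10, 5))   # r is a run of w^3
print(r in w * 2)                # r does not occur in w^2
```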

However, the following lemma shows that the length of such new runs can 
be bounded. 

Lemma 2. Let $w$ be a string of length $n$. For any $k \ge 3$, let $r = (i, l, p)$ be a
run in $w^k$. If $l \ge 2n$, then $i = 1$ and $l = kn$, that is, $r = w^k$.

Proof. We assume that $n > 1$, since the case $n = 1$ is trivial. Since $p$ is
the minimum period of the run $r$, we know $|r| = l \ge 2p$ and $l \ge 2n$. Let $u$
be a primitive string of length $m$ where $w = u^t$ for some integer $t \ge 1$. Then
$|u| = m \le n$ is also a period of the run $r$. Since $p + m \le l$, Lemma 1 claims
that $\gcd(p, m)$ is also a period of the run $r$. If $p > m$, then $\gcd(p, m) < p$, which
contradicts the assumption that $p$ is the minimum period of $r$. If $p < m$, then it
contradicts the assumption that $u$ is primitive. Therefore we have $p = m$. Since
$m$ is a period of $w^k$, we have $r = (1, kn, m) = w^k$.

This lets us prove the following lemma, which gives a formula for $\mathit{run}(w^k)$.

Lemma 3. Let $w$ be a string of length $n$. For any $k \ge 2$, $\mathit{run}(w^k) = Ak - B$,
where $A = \mathit{run}(w^3) - \mathit{run}(w^2)$ and $B = 2\,\mathit{run}(w^3) - 3\,\mathit{run}(w^2)$.

Proof. We consider the increase in the number of runs when concatenating
$w^k$ and $w$. Let $r = (i, l, p)$ be a run of $w^{k+1}$ such that $i + l > nk + 1$, that is,
$r$ ends somewhere in the last $w$ of $w^{k+1}$. By Lemma 2, if $i \le (k - 2)n$ then
$r = w^{k+1}$. In such a case, $r$ does not increase the number of runs since the run
will have already been considered in $w^2$. Therefore, the increase in runs can be
considered by restricting our attention to runs with $i > (k - 2)n$, that is, the
increase in runs for the last 3 $w$'s of $w^{k+1}$ when concatenating $w$ to the last 2
$w$'s of $w^k$. This gives us $\mathit{run}(w^{k+1}) - \mathit{run}(w^k) = \mathit{run}(w^3) - \mathit{run}(w^2)$. Hence

\[
\begin{aligned}
\mathit{run}(w^k) &= \mathit{run}(w^{k-1}) + \mathit{run}(w^3) - \mathit{run}(w^2) \\
&= \mathit{run}(w^{k-2}) + 2\bigl(\mathit{run}(w^3) - \mathit{run}(w^2)\bigr) \\
&= \mathit{run}(w^2) + (k - 2)\bigl(\mathit{run}(w^3) - \mathit{run}(w^2)\bigr) \\
&= k\bigl(\mathit{run}(w^3) - \mathit{run}(w^2)\bigr) - \bigl(2\,\mathit{run}(w^3) - 3\,\mathit{run}(w^2)\bigr)
\end{aligned}
\]

for $k \ge 3$. It is easy to see that the equation also holds for $k = 2$.
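The formula of Lemma 3 can be tried out on small strings. The sketch below uses a brute-force run counter; the names `count_runs`, `lemma3_coefficients` and `check_lemma3` are illustrative, and the choice of test strings and the cutoff `max_k=6` are arbitrary assumptions of this example.

```python
def count_runs(w):
    """Count the runs of w by brute force (suitable for short strings only)."""
    n, total = len(w), 0
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            j = i
            while j + p < n and w[j] == w[j + p]:
                j += 1
            seg = w[i:j + p]  # maximal stretch with period p starting at i
            if len(seg) >= 2 * p and not any(
                all(seg[k] == seg[k + q] for k in range(len(seg) - q))
                for q in range(1, p)
            ):
                total += 1  # p is the minimum period, so seg is a run
            i = j + 1
    return total


def lemma3_coefficients(w):
    """Return (A, B) with A = run(w^3) - run(w^2), B = 2 run(w^3) - 3 run(w^2)."""
    r2, r3 = count_runs(w * 2), count_runs(w * 3)
    return r3 - r2, 2 * r3 - 3 * r2


def check_lemma3(w, max_k=6):
    """Compare run(w^k) with A*k - B for k = 2..max_k."""
    a, b = lemma3_coefficients(w)
    return all(count_runs(w * k) == a * k - b for k in range(2, max_k + 1))


for w in ["a", "ab", "aab", "abcacabc"]:
    print(w, lemma3_coefficients(w), check_lemma3(w))
```

For instance, for $w$ = aab the counter gives $\mathit{run}(w^2) = 3$ and $\mathit{run}(w^3) = 4$, so $A = 1$ and $B = -1$.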



Theorem 2. For any string $w$ and any $\varepsilon > 0$, there exists a positive integer $N$
such that for any $n > N$,

\[
\frac{\rho(n)}{n} > \frac{\mathit{run}(w^3) - \mathit{run}(w^2)}{|w|} - \varepsilon .
\]



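Proof. By Lemma 3, $\mathit{run}(w^k) = Ak - B$ for $k \ge 2$, where $A = \mathit{run}(w^3) - \mathit{run}(w^2)$ and
$B = 2\,\mathit{run}(w^3) - 3\,\mathit{run}(w^2)$.

For any given $\varepsilon > 0$, we choose $N$ such that $N > \frac{A + B}{\varepsilon}$ and $N \ge 2|w|$. For any $n > N$, let $k$ be the
integer satisfying $|w|(k - 1) \le n < |w|k$. Then $k \ge 3$, so Lemma 3 applies to $w^{k-1}$. Notice that
$k > \frac{n}{|w|} > \frac{N}{|w|} > \frac{A + B}{\varepsilon |w|}$.
Since $\rho(i + 1) \ge \rho(i)$ for any $i$, and $|w^{k-1}| = |w|(k - 1)$,

\[
\frac{\rho(n)}{n} \ge \frac{\rho(|w|(k-1))}{|w|k} \ge \frac{\mathit{run}(w^{k-1})}{|w|k}
= \frac{A(k - 1) - B}{|w|k} = \frac{Ak - A - B}{|w|k}
= \frac{A}{|w|} - \frac{A + B}{|w|k} > \frac{A}{|w|} - \varepsilon .
\]

□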



4 New Lower Bounds 

We found some strings which contain many runs by running a computer program
that uses a simple heuristic search for run-rich binary strings. Given a buffer
size, the search starts with the single string 0 in the buffer. At each round,
two new strings are created from each string in the buffer by appending 0 or 1
to the string. The new strings are then sorted in order of $\mathit{run}(w^3) - \mathit{run}(w^2)$,
and only those that fit in the buffer are retained for the next round. Strings that
give a high ratio of runs are recorded.
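The search described above can be sketched as a small beam search. This is a minimal reading of the description and not the authors' program: the names `count_runs`, `score` and `beam_search`, the tie-breaking, and the way the best string is recorded (highest ratio $A/|w|$ seen so far) are all assumptions of this illustration, and the brute-force counter limits it to short strings.

```python
def count_runs(w):
    """Count the runs of w by brute force (suitable for short strings only)."""
    n, total = len(w), 0
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            j = i
            while j + p < n and w[j] == w[j + p]:
                j += 1
            seg = w[i:j + p]  # maximal stretch with period p starting at i
            if len(seg) >= 2 * p and not any(
                all(seg[k] == seg[k + q] for k in range(len(seg) - q))
                for q in range(1, p)
            ):
                total += 1
            i = j + 1
    return total


def score(w):
    """Sorting key from the text: run(w^3) - run(w^2)."""
    return count_runs(w * 3) - count_runs(w * 2)


def beam_search(buffer_size, rounds):
    """Grow binary strings one symbol per round, keeping the best buffer_size."""
    buffer = ["0"]
    best_w, best_a = "0", score("0")
    for _ in range(rounds):
        scored = [(score(w + c), w + c) for w in buffer for c in "01"]
        scored.sort(key=lambda t: t[0], reverse=True)
        scored = scored[:buffer_size]
        buffer = [w for _, w in scored]
        top_a, top_w = scored[0]
        # Record the string with the highest ratio score / length seen so far.
        if top_a * len(best_w) > best_a * len(top_w):
            best_w, best_a = top_w, top_a
    return buffer, best_w, best_a


buffer, best_w, best_a = beam_search(buffer_size=4, rounds=6)
print(best_w, best_a, best_a / len(best_w))
```

A practical search for strings of the size reported in this paper would need a much faster run counter and a larger buffer than this sketch uses.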

We tried several variations of the algorithm, and found many run-rich strings.
Among the strings found so far, the string $\tau$ lets us prove the currently best
lower bound on the maximum number of runs in a string. Since $\tau$ is too long to
include in the paper, we will make $\tau$ available on our web site (see footnote 4). Once we have
$\tau$, it is straightforward to confirm that the following lemma holds. Any naive
program to count runs in a string would be sufficient.

Lemma 4. There exists a string $\tau$ such that $|\tau| = 60064$, $\mathit{run}(\tau) = 56714$,
$\mathit{run}(\tau^2) = 113448$, and $\mathit{run}(\tau^3) = 170181$.

It immediately disproves the conjecture, since $56714/60064 \approx 0.944226$ is
already higher than the previous bound $\frac{3}{1+\sqrt{5}} \approx 0.927$. We now show the main
result of this paper.

Theorem 3. For any $\varepsilon > 0$ there exists a positive integer $N$ so that
$\rho(n) > (\alpha - \varepsilon)n$ for any $n > N$, where $\alpha = \frac{56733}{60064} \approx 0.944542$.



4 http://www.shino.ecei.tohoku.ac.jp/runs/



Proof. From Theorem 2 and Lemma 4, we have

\[
\frac{\rho(n)}{n} > \frac{170181 - 113448}{60064} - \varepsilon = \frac{56733}{60064} - \varepsilon .
\]



□ 
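The arithmetic behind Theorem 3 can be reproduced from the numbers in Lemma 4 alone. In the sketch below, the constant and function names are illustrative; the values 60064, 113448 and 170181 are taken from Lemma 4, and nothing here recomputes the runs of $\tau$ itself.

```python
from fractions import Fraction
from math import sqrt

LENGTH = 60064                  # |tau| from Lemma 4
RUN_2, RUN_3 = 113448, 170181   # run(tau^2) and run(tau^3) from Lemma 4

# Coefficients of Lemma 3: run(tau^k) = A*k - B for k >= 2.
A = RUN_3 - RUN_2
B = 2 * RUN_3 - 3 * RUN_2


def runs_in_power(k):
    """Number of runs in tau^k according to Lemma 3 (valid for k >= 2)."""
    return A * k - B


new_bound = Fraction(A, LENGTH)   # limit of run(tau^k) / |tau^k| as k grows
old_bound = 3 / (1 + sqrt(5))     # bound of Franek et al.

print(A, B)
print(float(new_bound), old_bound)
```

This gives $A = 56733$ and $B = 18$, matching the count $56733k - 18$ stated in the introduction.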



For proof of concept, we present in the Appendix a shorter string $\tau_{1558}$ with
$|\tau_{1558}| = 1558$, $\mathit{run}(\tau_{1558}) = 1445$, $\mathit{run}(\tau_{1558}^2) = 2915$, $\mathit{run}(\tau_{1558}^3) = 4374$ that
gives a smaller bound $(4374 - 2915)/1558 \approx 0.93645$ compared to $\tau$, but is still
better than the previously known bound.

5 Conclusion 

We presented a new lower bound $56733/60064 \approx 0.944542$ for the maximum
number of runs in a string. The proof was very simple once we had verified
that the number of runs in the string $\tau$ is 56714 and noticed some simple properties of the
string. We do not think that the bound is optimal. We believe that our work
will revive interest in pushing the lower bound higher, since the previous
bound $3/(1 + \sqrt{5}) \approx 0.927$ had been conjectured to be optimal since 2003.
Appendix

The binary string $\tau_{1558}$ with $|\tau_{1558}| = 1558$, $\mathit{run}(\tau_{1558}) = 1445$, $\mathit{run}(\tau_{1558}^2) =
2915$, $\mathit{run}(\tau_{1558}^3) = 4374$, giving the lower bound $(4374 - 2915)/1558 \approx 0.93645 >
0.927$.

110101101001011010110100101101011001101011010010110101101001011010 
110010110101101001011010110100101101011001101011010010110101101001 
011010110010110101101001011010110010110100101101011010010110101100 
101101011010010110101101001011010110010110100101101011010010110101 
100101101011010010110101100101101001011010110100101101011001011010 
110100101101011010010110101100101101011010010110101100101101001011 
010110100101101011001011010110100101101011010010110101100101101001 
011010110100101101011001011010110100101101011001011010010110101101 
001011010110010110101101001011010110100101101011001011010110100101 
101011001011010010110101101001011010110010110101101001011010110010 
110100101101011010010110101100101101011010010110101101001011010110 
010110100101101011010010110101100101101011010010110101100101101001 
011010110100101101011001011010110100101101011010010110101100101101 
011010010110101100101101001011010110100101101011001011010110100101 
101011010010110101100101101001011010110100101101011001011010110100 
101101011001011010010110101101001011010110010110101101001011010110 
100101101011001011010110100101101011001011010010110101101001011010 
110010110101101001011010110010110100101101011010010110101100101101 
011010010110101101001011010110010110100101101011010010110101100101 
101011010010110101100101101001011010110100101101011001011010110100 
101101011010010110101100101101011010010110101100101101001011010110 
100101101011001011010110100101101011010010110101100101101001011010 
110100101101011001011010110100101101011001011010010110101101001011 
0101100101101011010010110101101001011010 

By interpreting $\tau_{1558}$ as the binary representation of an integer, it can be
expressed in hexadecimal representation by:

0x35A5AD2D66B4B5A5ACB5A5AD2D66B4B5A5ACB5A5ACB4B5A5ACB5A5AD2D65A5AD 
2D65AD2D65A5AD2D65AD2D696B2D696B2D2D696B2D696B4B59696B4B596B4B5969 
6B4B596B4B5A5ACB5A5ACB4B5A5ACB5A5ACB4B5A5ACB5A5AD2D65A5AD2D65AD2D6 
5A5AD2D65AD2D696B2D696B2D2D696B2D696B4B59696B4B596B4B59696B4B596B4 
B5A5ACB5A5ACB4B5A5ACB5A5ACB4B5A5ACB5A5AD2D65A5AD2D65AD2D65A5AD2D65 
AD2D696B2D696B2D2D696B2D696B4B59696B4B596B4B59696B4B596B4B5A5A 



